Discovering Ontology Functional Dependencies

Author

  • Jaroslaw Szlichta
Abstract

Functional Dependencies (FDs) are commonly used in data cleaning to identify dirty and inconsistent data values. However, many errors require user input to supply domain-specific knowledge. For example, consider the drugs Advil and Crocin. FDs will consider these two drugs different because they are not syntactically equal. However, Advil and Crocin are synonyms: they are two different drugs with similar chemical compounds, marketed under distinct names in different countries. While FDs have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they cannot model broader relationships defined by ontologies, such as synonym and Is-A (inheritance) relationships. In this thesis, we take a first step towards discovering a new dependency called Ontology Functional Dependencies (OFDs). OFDs model attribute relationships based on relationships in a given ontology. We present two effective algorithms to discover OFDs using synonym and inheritance relationships. Our discovery algorithms search for minimal OFDs and prune redundant ones. Both algorithms traverse the search lattice in a level-wise Breadth-First Search (BFS) manner. In addition, we have developed a set of pruning rules to avoid considering unnecessary candidates in the search lattice. We present an experimental study describing the performance and scalability of our techniques. Experimental results show that both algorithms are effective in practice and discover OFDs efficiently for large datasets with millions of tuples. We also present a qualitative study showing that the discovered OFDs are meaningful, with high precision and recall.
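
The synonym case and the level-wise search described in the abstract can be illustrated with a small Python sketch. It is a simplified sketch under stated assumptions, not the thesis's algorithm. The ontology is modeled as a hypothetical dict mapping each value to the set of concept ids it may denote. A synonym OFD X -> A is taken to hold when every group of tuples that agree syntactically on X has A-values sharing at least one concept. Only the basic subset-based minimality pruning is shown; the thesis's further pruning rules and its inheritance (Is-A) OFDs are not reproduced. The function names, the concept id C1, and the sample rows are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def senses(value, ontology):
    """Concepts a value may denote. Values missing from the (hypothetical)
    ontology are tagged as raw literals so they only match themselves."""
    return ontology.get(value, {("raw", value)})


def holds_synonym_ofd(rows, lhs, rhs, ontology):
    """Check X ->syn A: rows that agree syntactically on every attribute in lhs
    must have rhs values sharing at least one common concept."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])
    for values in groups.values():
        common = set(senses(values[0], ontology))  # copy so &= never mutates the ontology
        for v in values[1:]:
            common &= senses(v, ontology)
            if not common:
                return False
    return True


def discover_synonym_ofds(rows, attributes, ontology):
    """Level-wise (BFS) walk of the attribute lattice: all LHS sets of size 1,
    then size 2, and so on. X -> A is skipped when Y -> A was already found for
    some Y that is a subset of X, so only minimal dependencies are returned."""
    found = []  # list of (frozenset lhs, rhs)
    for level in range(1, len(attributes)):
        for lhs in combinations(attributes, level):
            lhs_set = frozenset(lhs)
            for rhs in attributes:
                if rhs in lhs_set:
                    continue  # trivial: rhs already appears on the left
                if any(r == rhs and prev <= lhs_set for prev, r in found):
                    continue  # pruned: a smaller LHS already determines rhs
                if holds_synonym_ofd(rows, lhs, rhs, ontology):
                    found.append((lhs_set, rhs))
    return found


# Toy data: Advil and Crocin are mapped to one hypothetical concept, following the abstract's example.
ontology = {"Advil": {"C1"}, "Crocin": {"C1"}}
rows = [
    {"symptom": "headache", "country": "US", "drug": "Advil"},
    {"symptom": "headache", "country": "IN", "drug": "Crocin"},
    {"symptom": "fever", "country": "US", "drug": "Aspirin"},
]

print(holds_synonym_ofd(rows, ("symptom",), "drug", {}))        # False: plain FD fails
print(holds_synonym_ofd(rows, ("symptom",), "drug", ontology))  # True: synonym OFD holds
for lhs, rhs in discover_synonym_ofds(rows, ["symptom", "country", "drug"], ontology):
    print(sorted(lhs), "->", rhs)
```

On this toy relation the traditional FD symptom -> drug fails because Advil and Crocin differ syntactically, while the synonym OFD holds. The dependencies with drug on the left hold only because every drug value is unique in three rows, a reminder that very small samples produce spurious dependencies.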

Similar sources

Ontology Functional Dependencies

We extend traditional functional dependencies (FDs) for data quality purposes to accommodate ontological variations in the attribute values. We begin by formally defining a novel class of dependencies called ontological FDs, which strictly generalize traditional FDs by allowing differences controlled by an ontology database. The ontology databases contain information about synonyms. We then foc...

Discover Dependencies from Data - A Review

Functional and inclusion dependency discovery is important to knowledge discovery, database semantics analysis, database design, and data quality assessment. Motivated by the importance of dependency discovery, this paper reviews the methods for functional dependency, conditional functional dependency, approximate functional dependency and inclusion dependency discovery in relational databases ...

Efficient Discovery of Functional Dependencies and Armstrong Relations

In this paper, we propose a new efficient algorithm called Dep-Miner for discovering minimal non-trivial functional dependencies from large databases. Based on theoretical foundations, our approach combines the discovery of functional dependencies along with the construction of real-world Armstrong relations (without additional execution time). These relations are small Armstrong relations taki...

Discovering Functional Dependencies in Pay-As-You-Go Data Integration Systems

Functional dependency is one of the most extensively researched subjects in database theory, originally for improving quality of schemas, and recently for improving quality of data. In a pay-as-you-go data integration system, where the goal is to provide best-effort service even without thorough understanding of the underlying domain and the various data sources, functional dependency can play a...

Discovering Functional Dependencies and Association Rules by Navigating in a Lattice of OLAP Views

Discovering dependencies in data is a well-known problem in database theory. The most common rules are Functional Dependencies (FDs), Conditional Functional Dependencies (CFDs) and Association Rules (ARs). Many tools can display those rules as lists, but those lists are often too long for inspection by users. We propose a new way to display and navigate through those rules. Display is based on O...

Publication date: 2016